Section: New Software and Platforms

BAC

Bayesian Policy Gradient and Actor-Critic Algorithms

Keywords: Machine learning - Incremental learning - Policy learning

Functional Description: Monte-Carlo estimation of the performance gradient in policy gradient methods suffers from high variance. To address this, BAC supplements our Bayesian policy gradient framework with an actor-critic learning model in which the critic belongs to a Bayesian class of non-parametric critics based on Gaussian process temporal difference (GPTD) learning. Such critics model the action-value function as a Gaussian process, so that Bayes' rule yields the posterior distribution over action-value functions conditioned on the observed data. With appropriate choices of the policy parameterization and of the prior covariance (kernel) between action-values, the posterior distribution of the gradient of the expected return with respect to the policy parameters is available in closed form. We report detailed experimental comparisons of the proposed Bayesian policy gradient and actor-critic algorithms with classic Monte-Carlo-based policy gradient methods, as well as with each other, on a number of reinforcement learning problems.
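
For illustration only, the following minimal NumPy sketch shows how such a closed-form gradient posterior can be computed from a single trajectory, under simplifying assumptions: a one-dimensional Gaussian policy, a score-based (Fisher-style) kernel as the prior covariance between action-values, and no Fisher-information normalization. The function names (score, bac_gradient) and the policy choice are hypothetical and are not part of the BAC software itself.

    import numpy as np

    def score(theta, s, a, sigma_pi=0.5):
        # Score of a Gaussian policy pi(a|s) = N(theta * s, sigma_pi^2),
        # i.e. the gradient of log pi(a|s) with respect to theta.
        return (a - theta * s) * s / sigma_pi**2

    def bac_gradient(theta, states, actions, rewards, gamma=0.95, sigma_n=0.1):
        # Posterior mean of the policy gradient given one trajectory
        # (hypothetical sketch, not the BAC software's API).
        # Critic: GP prior over Q with kernel k(z, z') = u(z) u(z'),
        # where u is the policy score; observations are temporal
        # differences r_t = Q(z_t) - gamma * Q(z_{t+1}) + noise,
        # i.e. r = H q + n with H the (T-1) x T bidiagonal [1, -gamma].
        T = len(states)
        U = np.array([score(theta, s, a) for s, a in zip(states, actions)])
        K = np.outer(U, U)                    # prior covariance over Q values
        H = np.zeros((T - 1, T))
        for t in range(T - 1):
            H[t, t], H[t, t + 1] = 1.0, -gamma
        G = H @ K @ H.T + sigma_n**2 * np.eye(T - 1)
        alpha = H.T @ np.linalg.solve(G, np.asarray(rewards)[: T - 1])
        # With a score-based kernel, the gradient posterior mean is U alpha.
        return U @ alpha

An actor update would then take a step along this posterior mean, e.g. theta = theta + eta * bac_gradient(theta, states, actions, rewards); the Monte-Carlo baselines compared against in our experiments instead estimate the gradient directly from sampled returns, without a critic.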